Abstract- Early diagnosis and removal of colonic polyps is
effective  in  the  elimination  of  subsequent  carcinoma.  This  paper 
presents  a  new  approach  for  computer-aided  detection  of 
polyps.  The  approach  mimics  the  way  the  radiologists  view  CT 
abdomen  images  and  utilizes  several  geometric  attributes 
obtained  from  many  triples  of  mutually  orthogonal  planes.  The 
histogram  of  the  attributes  obtained  from  a  sufficiently  large 
number  of  perpendicular  random  images  serves  as  a  robust 
signature  to  represent  the  shape.  We  combine  the  new  3-D 
pattern  recognition  with  a  support  vector  machine  classifier, 
and  show  that  the  number  of  the  false  positive  detections  in  the 
initial  polyp  detection  studies  can  be  substantially  reduced.  One 
of  the  main  contributions  of  this  study  is  the  thorough  analysis 
of  planar  geometrical  attributes.  When  an  appropriate 
combination  of  planar  attributes  is  used,  the  false  positive  rate  is 
reduced  by  87  percent  beyond  that  of  the  initial  stage  detector, 
while  maintaining  a  sensitivity  level  of  95  percent.  Using  such 
methods,  radiologists  should  be  able  to  view  CTC  data  much 
more  efficiently  and  accurately  than  without  CAD. 

Keywords  -  Computer  aided  diagnosis,  vector  quantization, 
support  vector  machine,  polyp  detection 

I.  Introduction 

Colon  cancer  is  the  second  leading  cause  of  cancer  deaths 
in  the  USA  [1].  Previous  research  has  shown  that 
adenomatous  polyps  have  a  high  probability  of  developing 
into  subsequent  colorectal  carcinoma  [2].  Detection  and 
removal  of  pre-cancerous  polyps  can  prevent  eventual  cancer 
development. As such, a cost-effective and patient-comfortable
screening procedure is desirable in order to
diagnose the disease at an earlier stage.

CT colonoscopy (CTC) is a recently developed, non-invasive
screening method that combines spiral CT data
acquisition of the air-filled and cleansed colon with 3-dimensional
imaging software to create endoscopic images of
the  colonic  surface  [9].  While  initial  results  are  promising,  the 
method  is  limited  partly  due  to  the  extensive  amount  of 
radiologist  time  involved  in  the  interpretation  process. 
Therefore,  an  automated  computer-aided  detection  method 
for  polyps  is  necessary  to  increase  efficiency  prior  to  the 
widespread  use  of  CTC  for  screening. 

Automated  polyp  detection  is  a  new,  but  rapidly  growing 
area  of  research.  The  problem  of  identifying  colonic  polyps  is 
very  challenging  because  they  come  in  various  sizes  and 
shapes,  and  because  thickened  folds  and  retained  stool  may 
mimic  their  shape  and  density.  Fig.  1  gives  examples  of  the 
similar  shapes  rendered  from  polyps  and  normal  tissue.  Initial 
studies  concerning  automated  polyp  detection  have  been 
based  on  analysis  of  3D  shape.  In  [4],  Summers  et  al. 
computed  the  minimum,  maximum,  mean  and  Gaussian 
curvatures at all points on the colon wall. In [22], Yoshida et
al. used curvedness to distinguish polyps from healthy tissue.
In [5], Paik et al. introduced a method based on the concept
that normals to the colon surface will intersect with
neighboring normals depending on the local curvature
features of the colon. In [7], Gokturk and Tomasi designed a
method  where  a  sphere  is  fit  locally  to  the  isodensity  surface 
passing  through  every  CT  voxel  in  the  wall  region  and 
densely  populated  nearby  sphere  centers  are  considered  as 
polyp  candidates. 

Due  to  the  large  number  of  false  positive  detections,  all  of 
the  methods  mentioned  above  can  be  considered  more  as 
polyp  candidate  detectors  than  polyp  detectors.  This  paper 
presents  a  statistical  method  to  differentiate  between  polyps 
and  normal  tissue  amongst  the  candidate  polyps.  The  input  to 
the  system  is  a  set  of  small  candidate  volumes,  derived  from 
one  of  the  methods  just  discussed.  Our  volume  processing 
technique  attempts  to  mimic  the  way  radiologists  view  these 
shapes and generates shape signatures for each candidate
volume.  The  signatures  are  then  fed  to  a  support  vector 
machine  (SVM)  classifier  for  the  final  classification  of  the 
candidate  volume. 

The paper is organized as follows: Section II describes our
volume processing method in detail. Section III explains the
experimental setup and describes results from initial clinical
applications. Section IV presents our early conclusions and
possible directions for future work.

Fig. 1: Virtual endoscopic reconstructions from CTC data showing (a-c) true
polyps, (d) a normal thickened fold, and (e) retained stool.

Fig. 2: (a) Two different volume renderings of a polyp, (b-d) Three mutually
orthogonal tomographic planes through the same volume.

II.  Methodology 

Many radiologists prefer to view colon CT images using
three perpendicular image planes aligned with the transaxial,
sagittal, and coronal anatomical directions [3]. Our approach
takes advantage of this observation. To be more accurate, we
collect statistics over several random triples of
mutually orthogonal planes rather than using only the
principal directions (Fig. 2). Fig. 3 gives the overview of our
volume processing approach. Once many triples of planes
have been obtained, geometric attributes are computed for each
random plane. We use a histogram of these geometric attributes
obtained from each triple as a feature vector to represent the
shape. Taking histograms of these geometric attributes over
several random triples makes the resulting signatures
invariant to rotations and translations. More details on such
"signatures" are given in Sections A and B. Support vector
machines, as described in Section C, are then trained with
signatures computed from an initial set, and are subsequently
used to classify new candidate volumes as containing polyps
or normal tissue.

Fig. 3: Overview of the 3-D pattern processing approach.
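
The orientation part of drawing a random triple of mutually orthogonal planes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, only the plane normals are generated, and the random position of each triple inside the candidate volume is assumed to be drawn separately.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction on the unit sphere by normalizing a Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-3:
            return [c / norm for c in v]

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def random_plane_triple(rng):
    """Return three mutually orthogonal unit normals, one per slicing plane."""
    n1 = random_unit_vector(rng)
    while True:
        # Gram-Schmidt step: remove the n1 component from a second direction.
        v = random_unit_vector(rng)
        dot = sum(a * b for a, b in zip(v, n1))
        w = [a - dot * b for a, b in zip(v, n1)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm > 1e-3:  # retry if v was almost parallel to n1
            n2 = [c / norm for c in w]
            break
    n3 = cross(n1, n2)  # orthogonal to both by construction
    return n1, n2, n3

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
triples = [random_plane_triple(rng) for _ in range(100)]
```

Each normal defines one image plane through the candidate volume; resampling the CT voxels on those planes is omitted here.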

A.  Image  Processing 

Once  each  candidate  volume  is  sliced  with  several  triples 
of  perpendicular  planes,  the  necessary  region  on  the  slice 
needs  to  be  segmented.  On  each  slice,  a  polyp  may  not 
occupy  the  resulting  image  entirely.  As  a  consequence, 
images  are  segmented,  so  as  to  disregard  tissues  surrounding 
the putative polyp. By segmentation, we aim to find the
best square window that captures the essentials of the
shape, as demonstrated in Fig. 4. We formulate the
segmentation task as an optimization problem that searches for
the optimum window size [3]. Shape and intensity attributes
are computed in the resulting optimum sub-window.


Fig.  5.  (a)  Sample  image,  (b)  Edges,  (c)  Gaussian  mask  to  weight  the  edge 
points  on  the  image,  (d)  Circle,  (e)  ellipse,  (f)  line  fitted  to  the  weighted  edge 
points. 


The  geometric  attributes  of  the  image  should  capture 
representative  information  about  the  candidate  shape.  In  the 
following  paragraphs,  we  describe  the  feature  vectors  derived 
to  describe  these  geometric  attributes.  Primitive  shapes  such 
as  circle,  ellipse,  line  and  parallel  set  of  lines  (Fig.  4.  d,e,f,g) 
are  fit  to  the  largest  connected  edge  component,  i.e.  the 
boundary  of  the  shape. 

A  random  slice  of  a  sphere  is  a  circle.  Thus,  fitting  circles 
is  a  means  of  measuring  the  sphericity  of  the  3-D  shape. 
While  doing  so,  we  first  mask  the  boundary  points  by  a 
Gaussian  located  at  the  center  of  the  image  in  order  to  give 
more  importance  to  the  boundary  points  that  are  closer  to  the 
center of the image. Next, the least-squares solution that
minimizes the Gaussian-weighted residual of the circle fit is found.
The residual of this least-squares solution is recorded as well.
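
A Gaussian-weighted least-squares circle fit of the kind described above can be sketched as follows. The algebraic form of the fit, the definition of the residual as a weighted RMS distance, and the names `weighted_circle_fit`, `center`, and `sigma` are assumptions made for illustration; the text does not specify them.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting.
    No check is made for a singular system (e.g. collinear points)."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (M[r][3] - s) / M[r][r]
    return x

def weighted_circle_fit(points, center, sigma):
    """Fit x^2 + y^2 + D*x + E*y + F = 0 with Gaussian weights around `center`."""
    weights = [math.exp(-((x - center[0]) ** 2 + (y - center[1]) ** 2)
                        / (2.0 * sigma ** 2)) for x, y in points]
    # Weighted normal equations for the unknowns (D, E, F).
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for (x, y), w in zip(points, weights):
        row = (x, y, 1.0)
        z = -(x * x + y * y)
        for i in range(3):
            rhs[i] += w * row[i] * z
            for j in range(3):
                A[i][j] += w * row[i] * row[j]
    D, E, F = solve3(A, rhs)
    cx, cy = -D / 2.0, -E / 2.0
    radius = math.sqrt(max(cx * cx + cy * cy - F, 0.0))
    # Residual: weighted RMS distance of the points from the fitted circle.
    num = sum(w * (math.hypot(x - cx, y - cy) - radius) ** 2
              for (x, y), w in zip(points, weights))
    residual = math.sqrt(num / sum(weights))
    return (cx, cy), radius, residual
```

For edge points that lie exactly on a circle, the fit recovers that circle and the residual is essentially zero; flatter or irregular boundaries give larger residuals.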

Similarly,  the  residual  to  the  optimum  fitting  line  gives 
information on the flatness of the surface. Quadratic curves
are those described by a second-order equation in two variables.
By  fitting  a  quadratic  curve  to  the  boundary  of  the  image,  the 
ellipsoidal  structure  of  the  shape  can  be  measured,  thereby 
helping  to  capture  similarity  to  a  pedunculated  polyp. 

The  projection  of  a  fold  onto  the  image  plane  contains 
parallel  lines.  In  order  to  capture  this  structure,  we  apply 
parallel  lines  analysis,  which  includes  fitting  lines  to  the  two 
largest  connected  components  of  the  boundary  points.  The 
residual  to  these  lines  and  the  angle  between  the  parallel  lines 
are  also  recorded  as  a  feature  vector. 
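
The parallel lines analysis can be illustrated with a small sketch that fits a line to each of two edge components and reports both residuals and the angle between the fitted lines. The use of a total-least-squares fit and the function names are assumptions for illustration; extraction of the two largest connected components is taken as given.

```python
import math

def fit_line(points):
    """Total-least-squares line fit.
    Returns (direction angle in radians, RMS perpendicular residual)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # The principal axis of the 2x2 covariance matrix gives the line direction.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # The smaller eigenvalue is the mean squared perpendicular distance.
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    residual = math.sqrt(max(half_trace - root, 0.0))
    return theta, residual

def parallel_line_features(component_a, component_b):
    """Residuals of the two line fits and the angle between the fitted lines."""
    theta_a, res_a = fit_line(component_a)
    theta_b, res_b = fit_line(component_b)
    # Lines carry no direction sign, so fold the difference into [0, pi/2].
    diff = abs(theta_a - theta_b) % math.pi
    angle = min(diff, math.pi - diff)
    return res_a, res_b, angle
```

Two straight, parallel edges yield small residuals and an angle near zero, which is the pattern expected from a fold.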

In  order  to  extract  information  regarding  the  higher  order 
frequency  characteristics  of  the  boundary,  3rd  order  moment 
invariants  are  computed  as  well  [10].  This  gives  information 
regarding  the  curviness  of  the  boundary  points.  In  addition  to 
these  shape  features,  intensity  features  are  extracted  from  the 
tissue part of the image. These features include the mean and
standard deviation of the intensity of the tissue in the
volume.
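
As an illustration of third-order moment invariants, the sketch below computes two Hu-type invariants from the third-order central moments of a set of boundary points. The text only cites [10] for its invariants, so the specific formulas chosen here, and the function name, are assumptions; they are invariant to translation and rotation of the point set.

```python
def third_order_invariants(points):
    """Two Hu-type invariants built from third-order central moments."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    def mu(p, q):
        # Central moment of order (p, q), averaged over the boundary points.
        return sum((x - mx) ** p * (y - my) ** q for x, y in points) / n

    m30, m03, m21, m12 = mu(3, 0), mu(0, 3), mu(2, 1), mu(1, 2)
    phi3 = (m30 - 3.0 * m12) ** 2 + (3.0 * m21 - m03) ** 2
    phi4 = (m30 + m12) ** 2 + (m21 + m03) ** 2
    return phi3, phi4
```

Because central moments are taken about the centroid, shifting the boundary leaves both values unchanged, and the two combinations are the squared magnitudes of complex moments, which rotation does not alter.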

All  the  attributes  mentioned  so  far  are  calculated  for  each 
random  triple  of  images.  The  three  images  in  each  triple  are 
sorted  in  the  order  of  increasing  radius  of  curvature,  and  the 
features  above  are  listed  into  a  vector  in  this  order.  Often,  one 
plane  out  of  the  three  planes  does  not  contain  useful 
geometric  information,  i.e.  it  might  represent  uniform  tissue 
or  air-containing  voxels.  In  these  cases,  features  from  the 
non-contributing  plane  are  not  considered  further.  The 
remaining  vector  represents  the  signature  of  the  candidate 
volume  relative  to  that  particular  triple  of  perpendicular 
planes. 

Fig. 6: The results obtained with different planar attributes.


B.  Vector  Quantization 

The  features  computed  from  each  triple  of  perpendicular 
planes  depend  on  the  position  and  orientation  of  that 
particular  triple.  However,  if  histograms  of  feature 
distributions  are  computed  from  sufficient  numbers  of  triples 
with  random  positions  and  orientations,  the  histograms 
themselves  become  invariant  to  position  and  orientation. 
Representing the histograms of these attribute vectors is
non-trivial, as uniformly spaced histogram bin centers are not
feasible in the high-dimensional space of the underlying
vectors. An alternative method is necessary to obtain
histogram bin centers that represent the data
accurately. We use vector quantization for this purpose. First,
the representative histogram bin centers are obtained using a
k-means clustering algorithm [11]. This algorithm starts with
a random selection of feature vectors as histogram bin centers
and iterates to find the optimum bin centers. Once the bin
centers are obtained, each feature vector partitions a unit vote into
fractions that are inversely proportional to the vector’s
distances to all cluster centers. The histograms thus obtained,
one  per  candidate  volume,  are  the  final  shape  signatures  used 
for  classification  as  described  in  the  next  section. 
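
The vector quantization step can be sketched as follows: k-means supplies the bin centers, and each feature vector then splits a unit vote among the centers in inverse proportion to its distances. This is a minimal illustration under assumptions not stated in the text: plain Lloyd iterations, Euclidean distance, a fixed iteration count, and a final normalization of the histogram by the number of feature vectors. All function names are hypothetical.

```python
import math
import random

def kmeans(vectors, k, rng, iterations=50):
    """Plain Lloyd's k-means, initialized from randomly chosen feature vectors."""
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: math.dist(v, centers[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def soft_votes(vector, centers):
    """Split one unit vote into fractions inversely proportional to distance."""
    dists = [math.dist(vector, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # A vector lying exactly on a center gives its whole vote to that center
        # (shared equally if several centers coincide with it).
        hits = sum(zeros)
        return [1.0 / hits if z else 0.0 for z in zeros]
    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    return [w / total for w in inverse]

def shape_signature(feature_vectors, centers):
    """Accumulate the soft votes of all feature vectors of one candidate volume."""
    hist = [0.0] * len(centers)
    for v in feature_vectors:
        for j, vote in enumerate(soft_votes(v, centers)):
            hist[j] += vote
    return [h / len(feature_vectors) for h in hist]
```

In this sketch, `kmeans` would be run once on feature vectors pooled from the training volumes, and `shape_signature` would then be called once per candidate volume with that volume's feature vectors.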

C.  Support  Vector  Machine  Classification 

A  classifier  learning  algorithm  takes  a  training  set  as 
input  and  produces  a  classifier  as  its  output.  In  our 
experiment,  a  training  set  is  a  collection  of  candidate  volumes 
that a radiologist has individually labeled as polyps or
non-polyps. A Support Vector Machine classifier finds the best
discriminating hypersurface between the two classes in the
training set, i.e. polyps and non-polyps [8]. This hypersurface
not only classifies the two subsets correctly but also
maximizes the margin between the closest points. The points
closest to the hypersurface are termed the “support vectors”.
The problem of finding the optimal hypersurface is
shown to be a quadratic programming problem in [8].
Suppose that vector $x_i$ in the training set is given (by the
radiologist) a label $y_i = 1$ if it is a polyp and $y_i = -1$ if it is not.
Then the optimal classifier has the form:

$$f(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b \qquad (1)$$

where SV denotes the set of support vectors, K is a kernel
function used to transform the data from its original space to
a higher-dimensional space where it is easier to discriminate the two
classes, and $\alpha_i$ and $b$ are constants computed by the
classifier-learning algorithm.
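
Evaluating the classifier of Eq. (1) for a new shape signature can be sketched as follows. The coefficients, labels, and offset are assumed to come from an already trained SVM. The kernel shown is one common definition of the exponential radial basis function named in Section III, exp(-||u - v|| / (2 sigma^2)); whether this matches the exact form in [8], and the value of `sigma`, are assumptions.

```python
import math

def exp_rbf_kernel(u, v, sigma=1.0):
    """Exponential RBF kernel: exp(-||u - v|| / (2 * sigma^2))."""
    return math.exp(-math.dist(u, v) / (2.0 * sigma ** 2))

def decision_value(x, support_vectors, alphas, labels, b, kernel=exp_rbf_kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b

def classify(x, support_vectors, alphas, labels, b):
    """Return +1 (polyp) when f(x) is non-negative, otherwise -1 (non-polyp)."""
    return 1 if decision_value(x, support_vectors, alphas, labels, b) >= 0 else -1
```

Only the support vectors enter the sum, so the cost of classifying a candidate grows with the number of support vectors rather than with the size of the training set.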

SVMs  minimize  the  structural  risk,  given  as  the 
probability  of  misclassifying  previously  unseen  data.  In 
addition,  support  vectors  are  inherently  the  vectors  that  carry 
the  differentiating  characteristics  of  each  class.  Identification 
of  these  vectors  exploits  all  of  the  information  in  the  training 
set optimally, and eliminates the guesswork from the task of
defining  appropriate  discrimination  criteria. 

III. Experiments

We  used  a  data  set  consisting  of  small  candidate  volumes 
from  the  CT  scans  of  subjects  enrolled  in  our  CT 
colonography  study  comprising  30  known  colonic  polyps  and 
212  other  regions  containing  tissue  from  non-polyp  or  normal 
mucosal  surfaces.  These  non-polyp  structures  were  obtained 
by  taking  the  false  positive  detection  outputs  of  the 
algorithms  presented  in  previous  work  [5,7].  These  areas 
typically  represented  thickened  haustral  folds,  convergent 
folds,  or  foci  of  retained  stool. 

One hundred random triples of perpendicular images were
extracted from each candidate volume. A 24-element vector
was obtained for each triple. These vectors incorporate all
the geometric attributes from each particular plane. For
analysis purposes, we divided these attributes into five
different categories: (1) {radius of best fitting circle, residual
to the best fitting circle, residual to best fitting line}, (2)
{moment invariants}, (3) {quadric invariants}, (4) {intensity
based features}, and (5) {parallel line features}. We divided
our experiments into different groups where subsets of these
five categories were evaluated. This way, one can judge the
relative importance of each attribute. An optimum selection
of these geometric attributes would not only improve
computational efficiency but could also give valuable
feedback to radiologists.

The  results  are  summarized  in  Fig.  6.  The  numbering  on 
the  legend  is  the  same  as  the  numbers  of  the  attributes 
mentioned  above.  An  exponential  radial  basis  function  [8] 
was  used  as  kernel  function  with  support  vector  machine 
classifiers.  A  10-fold  cross  validation  was  applied  in  order  to 
obtain the results given in Fig. 6. Here, sensitivity is the
ratio of detected polyps to all 30 polyps. The
false positive rate is the fraction of false positives in the
initial set that are not eliminated. On average, the false positive
rate is 35%, 23%, 18%, 15% and 13% at polyp
detection  sensitivity  levels  of  100%,  95%,  90%,  85%,  and 
80%  respectively.  The  combination  of  attributes  {1,  2,  3  and 
5}  served  as  the  optimal  set  in  these  initial  experiments.  In 
geometric  terms,  these  results  show  that  searching  for  circular 
and  elliptical  structures  in  the  image,  then  searching  for 
parallel  lines  and  3rd  order  curved  structures  is  the  most 
discriminating  strategy  for  polyp  detection.  When  these 
planar  attributes  are  used,  the  false  positive  rate  can  be 
further  reduced  to  28%,  15%,  13%,  9%  and  8%  for  sensitivity 
levels  of  100%,  95%,  90%,  85%,  and  80%  respectively. 
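
The way a false positive rate is read off at a chosen sensitivity level can be sketched as follows. The function name and the use of classifier decision values as scores are assumptions for illustration; the threshold is lowered until the required fraction of polyps is detected, and the non-polyp candidates that still pass are counted.

```python
import math

def false_positive_rate_at(polyp_scores, nonpolyp_scores, target_sensitivity):
    """Return (achieved sensitivity, false positive rate) at the highest
    threshold that still detects the required fraction of polyps."""
    ranked = sorted(polyp_scores, reverse=True)
    # Number of polyps that must be detected (small epsilon guards rounding).
    needed = max(math.ceil(target_sensitivity * len(ranked) - 1e-9), 1)
    threshold = ranked[needed - 1]  # score of the last polyp that must pass
    detected = sum(s >= threshold for s in polyp_scores)
    kept_fp = sum(s >= threshold for s in nonpolyp_scores)
    return detected / len(polyp_scores), kept_fp / len(nonpolyp_scores)
```

With the 212 non-polyp candidates in this data set, a false positive rate of 15% corresponds to roughly 32 candidates that the classifier fails to eliminate.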

In  the  current  experimental  design,  we  search  for  the 
optimum  set  of  planar  attributes  by  trial  and  error,  i.e.,  by 
searching  the  space  of  all  possibilities  one  by  one.  While  the 
current  results  are  considerably  improved  compared  with  our 
earliest implementation, we desire more targeted methods to
replace  the  exhaustive  search  strategy  for  obtaining  the 
optimum  choice  and  combination  of  attributes. 
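
The exhaustive search over attribute categories can be sketched as below. With five categories there are 2^5 - 1 = 31 non-empty subsets to evaluate. The callback `evaluate` is hypothetical: it is assumed to train and cross-validate the classifier on the given subset and return a score to minimize, such as the false positive rate at a fixed sensitivity.

```python
from itertools import combinations

CATEGORIES = (1, 2, 3, 4, 5)  # numbering of the five attribute categories in the text

def all_subsets(categories=CATEGORIES):
    """Yield every non-empty subset of the attribute categories."""
    for size in range(1, len(categories) + 1):
        yield from combinations(categories, size)

def exhaustive_search(evaluate, categories=CATEGORIES):
    """Return the subset with the lowest score reported by `evaluate`."""
    return min(all_subsets(categories), key=evaluate)
```

The number of subsets doubles with every added category, which is why a more targeted selection method becomes attractive as the attribute set grows.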

IV. Conclusion

Virtual  colonoscopy  is  a  promising  new  medical  imaging 
technique  to  evaluate  the  human  colon  for  precancerous 
polyps.  Due  to  the  large  amount  of  radiologist  time  involved 
in  reviewing  hundreds  of  images  in  a  search  for  small  lesions, 
computer  aided  diagnosis  is  necessary  to  make  the  approach 
efficient  and  cost-effective.  Previous  automated  detection 
methods  had  a  high  sensitivity  for  polyp  detection,  but  relied 
on  human  observations  to  differentiate  polyps  from  normal 
folds  or  retained  fecal  material.  To  be  more  accurate,  we  need 
a  method  that  is  capable  of  differentiating  polyps  from  other 
normal  healthy  structures  in  the  colon.  In  this  study,  we 
proposed  a  learning  approach  that  yields  a  good  polyp 
detection  rate  with  a  reasonable  number  of  false  positives, 
thereby  showing  the  feasibility  of  computer-based  screening. 
One  of  the  main  contributions  of  the  paper  is  the  new  3-D 
pattern  analysis  approach,  which  combines  the  information  of 
planar  attributes  from  many  random  images  to  generate 
reliable  shape  signatures.  We  present  an  experimental  setup 
to  obtain  the  best  combination  of  these  planar  attributes  by 
exhaustive search. We also show that the use of support
vector machines makes it possible to implicitly distinguish the
differentiating characteristics of polyps and healthy tissue,
thus improving classification rates.

There  are  many  possible  directions  for  future 
investigation.  First,  we  would  like  to  analyze  support  vectors 
to  observe  the  differentiating  characteristics  of  polyps  and 
healthy  tissue.  Using  the  features  in  these  support  vectors,  we 
would  like  to  find  an  automated  way  to  obtain  the  optimum 
combination  of  planar  features  to  be  used  in  polyp  detection. 
In  addition,  studies  integrating  these  computer-aided 
detection  schemes  with  radiologist  readers  will  be  used  to 
measure  potential  improvements  in  sensitivity  and  efficiency 
compared  with  unassisted  radiologist  interpretation. 